Getting Started with R(Studio)

Data Carpentry for Social Sciences and Humanities

2024-09-30

Why R?

It’s a programming language/software that is FREE and open source! 🎉

It was created by statisticians for statistics 📊

Because it’s FREE and open source and works with scripts, it’s great for reproducibility 💪

Did I mention FREE?

Ok… So why RStudio?

RStudio is an integrated development environment (IDE)

It’s essentially a (much prettier) wrapper for the R software

R is integrated into RStudio, so you never actually have to open R, which is a good thing…

Let’s take the tour

Organising working directory
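
A minimal sketch of one way to set up a project folder from the console. The folder names (data, data_output, fig_output) are assumptions for illustration, not something these slides prescribe, and a temporary directory stands in for your real project folder so the example is safe to run.

```r
# Hypothetical layout: the folder names below are illustrative assumptions
project_dir <- file.path(tempdir(), "my_project")  # stand-in for your real project folder
dir.create(project_dir, showWarnings = FALSE)

# One folder per kind of file keeps raw data, outputs and figures apart
for (folder in c("data", "data_output", "fig_output")) {
  dir.create(file.path(project_dir, folder), showWarnings = FALSE)
}

# List what was created
list.files(project_dir)
```

In your own project, getwd() shows which folder R is currently working in.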

Basics of R

R is a language spoken by the R software

Software can be very ‘dumb’

So we need to learn its language to communicate EXACTLY what we want

And like learning any language, this requires practice!

Things to look forward to:

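# Needs the tidyverse (ggplot2 and the %>% pipe).
# percent_items is assumed to be a data frame with village, items and percent columns.
library(tidyverse)

percent_items %>%
    ggplot(aes(x = village, y = percent, fill = items)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~ items) +
    theme_bw() +
    theme(panel.grid = element_blank(),
          legend.position = "none")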

Things to look forward to:

Artwork by @allison_horst

Exercise 1

5 mins

Create two variables r_length and r_width and assign them values.

Create a third variable r_area and give it a value by multiplying r_length and r_width.

Solution

r_length <- 6
r_width <- 7
r_area <- r_length * r_width
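
A small follow-up sketch, using the same values as the solution, to show one thing that often surprises beginners: an object keeps the value it was given, so changing r_length later does not change r_area until the calculation is run again.

```r
r_length <- 6
r_width <- 7
r_area <- r_length * r_width
r_area            # typing an object's name prints its value: 42

# Changing r_length afterwards does NOT update r_area
r_length <- 10
r_area            # still 42

# Re-run the assignment to recalculate
r_area <- r_length * r_width
r_area            # now 70
```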

Exercise 2

5 mins

Type ?round into the console to open the help page for the round() function.

Find the appropriate function to round 1.624 down to the nearest integer.

Solution

floor(1.624)
[1] 1
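
For comparison, here is a short sketch of the related rounding functions listed on the same help page. The variable name x is only for illustration.

```r
x <- 1.624

round(x)              # nearest integer: 2
round(x, digits = 1)  # one decimal place: 1.6
floor(x)              # always rounds down: 1
ceiling(x)            # always rounds up: 2
trunc(x)              # drops the decimal part: 1

# floor() and trunc() only differ for negative numbers
floor(-x)             # -2
trunc(-x)             # -1
```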

Exercise 3

10 mins

What will happen in each of the examples below?

💡 Hint: use typeof() to check the data type of your objects

num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2L, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

Why does this happen?

Solution

Vectors can only contain a single data type.

R converts every element to the most general type present, which loses as little information as possible.

logical → integer → double → character

num_char
[1] "1" "2" "3" "a"
num_logical
[1] 1 2 3 1
char_logical
[1] "a"    "b"    "c"    "TRUE"
tricky
[1] "1" "2" "3" "4"
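
Following the hint, a quick sketch that checks each vector with typeof(), and then tries converting back to numbers with as.numeric(). The conversion step is an extra illustration that goes beyond what the exercise asks for.

```r
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2L, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

typeof(num_char)      # "character"
typeof(num_logical)   # "double"
typeof(char_logical)  # "character"
typeof(tricky)        # "character"

# Converting back only works when the text looks like a number
as.numeric(tricky)                      # 1 2 3 4
suppressWarnings(as.numeric(num_char))  # "a" cannot be converted and becomes NA
```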

Exercise 4

5 mins

How many values in combined_logical are "TRUE" (as a string)?

num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
combined_logical <- c(num_logical, char_logical)

Solution

combined_logical
[1] "1"    "2"    "3"    "1"    "a"    "b"    "c"    "TRUE"

Only one. The TRUE in num_logical is first converted to 1, and then to "1" when combined with char_logical.
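
Rather than counting by eye, R can do the counting. This is a sketch of one possible approach, and the name n_true is only for illustration.

```r
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
combined_logical <- c(num_logical, char_logical)

# The comparison gives TRUE or FALSE for each element; sum() counts the TRUEs
n_true <- sum(combined_logical == "TRUE")
n_true   # 1
```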

Exercise 5

10 mins

  1. Using this vector of rooms, create a new vector with the NAs removed:
rooms <- c(1, 2, 1, NA, 3, 1, 3, 2, 8, NA, 1)

  2. Then calculate the median.

  3. Use R to calculate how many elements of rooms are larger than 2.

Solution

# 1
rooms_no_na <- rooms[!is.na(rooms)]
# or
rooms_no_na <- na.omit(rooms)

# 2
median(rooms, na.rm = TRUE) # or median(rooms_no_na)
[1] 2
# 3
rooms_above_2 <- rooms_no_na[rooms_no_na > 2]
length(rooms_above_2)
[1] 3
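
A possible shortcut for the last step, sketched here as an alternative rather than the expected answer: because TRUE counts as 1, sum() can count matching elements directly without creating an intermediate vector. The names n_above_2 and n_missing are only for illustration.

```r
rooms <- c(1, 2, 1, NA, 3, 1, 3, 2, 8, NA, 1)

# rooms > 2 is a logical vector (NA wherever rooms is NA)
n_above_2 <- sum(rooms > 2, na.rm = TRUE)
n_above_2   # 3

# The same idea counts how many values were missing in the first place
n_missing <- sum(is.na(rooms))
n_missing   # 2
```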